-
Notifications
You must be signed in to change notification settings - Fork 7
feat(BRE2-661): Add Support for San Francisco Compute #87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* add scaffolding, and client authentication.
* return APITypeGlobal from GetAPIType function, as SFC accounts are not tied to specific regions.
* fix apiKey in SFCCredential struct
* scaffolding for instance.go
* add instance creation implementation with SSH key support
* add function to map the status of a node reported from SFC API to v1.LifecycleStatus in Brev
* implement GetInstance in sfcompute with node data retrieval inluding SSH Hostname / Public IP
* add TerminateInstance implementation with node release and delete logic
* set default SSH port to 2222, as is standard for our platform
* implement GetSSHHostname for retrieving the SSH hostname of an instance in sfcompute
* remove unneeded call to api for GetSSHHostname
* use VM ID instead of instance ID to retrieve SSH Hostname
* remove get ssh hostname function
* implement ListInstances
* add validation test for sfcompute with API key check and skip logic
* add getInstanceTypeID method for generating instance type IDs in sfcompute
* bump sfcnodes version to v0.1.0-alpha.4 which adds support for the /v0/zones endpoint
* implement GetLocations
* only return approved zones
* only return regions that have more than zero capacity instead of any zones that have capacity not equal to zero, in case the availability ever returns a negative number
* update location description to include formatted hardware type information. example: `sfc_hayesvalley_h100`
* return unavailable regions with v1.Location{Available: false}
* fix an error where a nil map was returned
* start implementing GetInstanceTypes
* fix tests failing due to ValidateRegionalInstanceTypes and ValidateStableInstanceTypeIDs fails errors
* set MaxPricePerNodePerHour to 1600 ($16/node/h, $2/gpu/h)
* add regions excelsior and yerba
* remove region excelsior
894ca32 to
29104ff
Compare
| t.Fatalf("failed to make client: %v", err) | ||
| } | ||
|
|
||
| instance, err := client.GetInstance(context.Background(), "6c7a3ade-1e59-4e04-af6e-365046995a81_test") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's with the hardcoded id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah these are basically test tasks, so this was left over from testing.
| ) | ||
|
|
||
| const ( | ||
| maxPricePerNodeHour = 1600 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this 1600 dollars?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cents
| } | ||
|
|
||
| func sshKeyCloudInit(sshKey string) string { | ||
| script := fmt.Sprintf("#cloud-config\nssh_authorized_keys:\n - %s", sshKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to double check for ufw, or does sfcompute have firewalls external?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah SFC essentially only has port 2222 open at an infrastructural level.
v1/providers/sfcompute/instance.go
Outdated
|
|
||
| func sfcStatusToLifecycleStatus(status string) v1.LifecycleStatus { | ||
| switch strings.ToLower(status) { | ||
| case "pending", "nodefailure", "unspecified", "awaitingcapacity", "unknown", "failed": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
failed/nodefailure will sit in pending and then our 30 minute timeout will hit and turn these to failed, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch I will adjust these to map to LifecycleStatusFailed
| diskTypeSSD = "ssd" | ||
| ) | ||
|
|
||
| var allowedZones = []string{"hayesvalley", "yerba"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the zones have an "enabled" or "available" so we don't have to hardcode them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They have an "enabled" but we also want to explicitly exclude zones (per SFC's suggestion out of band).
No description provided.